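Spring AI Observability

Overview

Spring AI builds on the observability features of the Spring ecosystem to provide insight into AI-related operations. It provides metrics and tracing for the following core components:

ChatClient (including Advisor)
ChatModel
EmbeddingModel
ImageModel
VectorStore

To enable metrics and tracing support in your application, see the Spring Boot Metrics and Spring Boot Tracing documentation.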

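ChatClient Observations

The spring.ai.chat.client observations are recorded when a ChatClient call() or stream() operation is invoked. They measure the time spent performing the invocation and propagate the related tracing information.

Common attributes

Attribute	Description
`gen_ai.operation.name`	Always `framework`.
`gen_ai.system`	Always `spring_ai`.
`spring.ai.chat.client.stream`	Whether the chat model response is a stream: `true` or `false`.
`spring.ai.kind`	The kind of framework API in Spring AI: `chat_client`.

Additional attributes

Name	Description
`spring.ai.chat.client.advisors`	List of the configured chat client advisors.
`spring.ai.chat.client.conversation.id`	Identifier of the conversation when chat memory is used.
`spring.ai.chat.client.tool.names`	Names of the tools passed to the chat client.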

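Logging configuration

ChatClient prompt and completion data is typically large and may contain sensitive information. For those reasons, it is not exported by default.

Spring AI supports logging the prompt and completion data to help with debugging and troubleshooting.

Property	Description	Default
`spring.ai.chat.client.observations.log-prompt`	Whether to log the chat client prompt content.	`false`
`spring.ai.chat.client.observations.log-completion`	Whether to log the chat client completion content.	`false`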
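
As a rough illustration of the opt-in behavior described above, the sketch below reads the two property keys from properties-file text and falls back to the documented default of `false`. In a real application Spring Boot binds these properties itself; the `ChatClientLoggingFlags` class and its `parse` and `flag` helpers are hypothetical names introduced only for this example and use nothing beyond `java.util.Properties`.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

final class ChatClientLoggingFlags {

    // Property keys copied from the table above.
    static final String LOG_PROMPT = "spring.ai.chat.client.observations.log-prompt";
    static final String LOG_COMPLETION = "spring.ai.chat.client.observations.log-completion";

    // Parses properties-file text (hypothetical helper for this sketch).
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return props;
    }

    // Reads a boolean flag, falling back to the documented default of false.
    static boolean flag(Properties props, String key) {
        return Boolean.parseBoolean(props.getProperty(key, "false"));
    }

    public static void main(String[] args) {
        // Only prompt logging is switched on; completion logging keeps its default.
        Properties props = parse(LOG_PROMPT + "=true\n");
        System.out.println(LOG_PROMPT + " -> " + flag(props, LOG_PROMPT));
        System.out.println(LOG_COMPLETION + " -> " + flag(props, LOG_COMPLETION));
    }
}
```

Because both flags can expose sensitive content, enabling them only in development or troubleshooting profiles is a reasonable default.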

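Advisor Observations

The spring.ai.advisor observations are recorded when an advisor is executed. They measure the time spent in the advisor (including the time spent in inner advisors) and propagate the related tracing information.

Attribute	Description
`gen_ai.operation.name`	Always `framework`.
`gen_ai.system`	Always `spring_ai`.
`spring.ai.advisor.name`	Name of the advisor.
`spring.ai.kind`	The kind of framework API in Spring AI: `advisor`.
`spring.ai.advisor.order`	Order of the advisor in the chain.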

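ChatModel Observations

The OpenAI and Anthropic chat models both emit HTTP-level observations in addition to the chat model observations (chat model layer plus HTTP layer). Streaming calls have one specific limitation:

Synchronous calls: the okhttp.requests span is correctly nested under the gen_ai.client.operation span.
Streaming calls: the HTTP span is recorded, but not as a child of the chat model span. The SDK's asynchronous streaming path hops to ForkJoinPool.commonPool() before invoking Spring AI's HTTP client, and the calling thread's observation context is dropped at that boundary.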
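
The streaming limitation comes down to thread-bound state not following work onto another thread. The sketch below reproduces that effect with a plain `ThreadLocal` standing in for the observation context; it does not use Spring AI or Micrometer types, and `ContextLossSketch` and `CURRENT_SPAN` are hypothetical names chosen for illustration.

```java
import java.util.concurrent.CompletableFuture;

final class ContextLossSketch {

    // Stand-in for a thread-bound observation context (not a Spring AI type).
    static final ThreadLocal<String> CURRENT_SPAN = new ThreadLocal<>();

    // Reads the context on the calling thread, as a synchronous call would.
    static String readSynchronously() {
        return CURRENT_SPAN.get();
    }

    // Reads the context after hopping to the default async executor,
    // which is normally ForkJoinPool.commonPool().
    static String readAfterAsyncHop() {
        return CompletableFuture.supplyAsync(CURRENT_SPAN::get).join();
    }

    public static void main(String[] args) {
        CURRENT_SPAN.set("gen_ai.client.operation");
        try {
            System.out.println("same thread: " + readSynchronously());
            System.out.println("after hop:   " + readAfterAsyncHop());
        } finally {
            CURRENT_SPAN.remove();
        }
    }
}
```

The read on the calling thread sees the value, while the read after the hop sees `null`, because a plain `ThreadLocal` is not copied to the pool thread. A span started on that pool thread therefore has no parent to attach to, which matches the behavior described for streaming calls.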

For details, see the OpenAI chat documentation and the Anthropic chat documentation.

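gen_ai.client.operation observations

Recorded when the ChatModel call or stream method is invoked. They measure the time the method takes to complete and propagate the related tracing information.

Attribute	Description
`gen_ai.operation.name`	The name of the operation being performed.
`gen_ai.system`	The model provider as identified by the client instrumentation.
`gen_ai.request.model`	The name of the model the request is made to.
`gen_ai.response.model`	The name of the model that generated the response.
`gen_ai.request.frequency_penalty`	The frequency penalty setting for the model request.
`gen_ai.request.max_tokens`	The maximum number of tokens the model generates for a request.
`gen_ai.request.presence_penalty`	The presence penalty setting for the model request.
`gen_ai.request.stop_sequences`	List of sequences that cause the model to stop generating further tokens.
`gen_ai.request.stream`	Whether the request was made in streaming mode. Present only when `true`.
`gen_ai.request.temperature`	The temperature setting for the model request.
`gen_ai.request.top_k`	The top_k sampling setting for the model request.
`gen_ai.request.top_p`	The top_p sampling setting for the model request.
`gen_ai.response.finish_reasons`	Reasons the model stopped generating tokens, one for each generation received.
`gen_ai.response.id`	The unique identifier of the AI response.
`gen_ai.usage.cache_creation.input_tokens`	The number of input tokens written to a provider-managed cache.
`gen_ai.usage.cache_read.input_tokens`	The number of input tokens served from a provider-managed cache.
`gen_ai.usage.input_tokens`	The number of tokens used in the model input (prompt).
`gen_ai.usage.output_tokens`	The number of tokens used in the model output (completion).
`gen_ai.usage.total_tokens`	The total number of tokens used in the model exchange.
`spring.ai.model.request.tool.names`	List of tool definitions provided to the model in the request.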

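Logging configuration

Chat prompt and completion data is typically large and may contain sensitive information. For those reasons, it is not exported by default.

Spring AI supports logging chat prompt and completion data, which is useful for troubleshooting. When tracing is enabled, the logs include trace information for better correlation.

Property	Description	Default
`spring.ai.chat.observations.log-prompt`	Log the prompt content. `true` or `false`.	`false`
`spring.ai.chat.observations.log-completion`	Log the completion content. `true` or `false`.	`false`
`spring.ai.chat.observations.include-error-logging`	Include error logging in observations. `true` or `false`.	`false`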

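Tool Calling Observations

The spring.ai.tool observations are recorded when a tool call is executed in the context of a chat model interaction. They measure the time the tool call takes to complete and propagate the related tracing information.

Attribute	Description
`gen_ai.operation.name`	Always `execute_tool`.
`gen_ai.system`	The provider responsible for the operation. Always `spring_ai`.
`spring.ai.kind`	The kind of operation performed by Spring AI. Always `tool_call`.
`spring.ai.tool.definition.name`	The name of the tool.
`spring.ai.tool.type`	The type of the tool. Defaults to `function`.

Additional attributes

Name	Description
`spring.ai.tool.definition.description`	The description of the tool.
`spring.ai.tool.definition.schema`	The schema of the parameters used to call the tool.
`spring.ai.tool.call.id`	The ID of the tool call, as identified by the chat model.
`spring.ai.tool.call.arguments`	The input arguments of the tool call. (Only when enabled.)
`spring.ai.tool.call.result`	The result of the tool call execution. (Only when enabled.)

Tool call input arguments and results are not exported by default, because they may contain sensitive information.

Spring AI supports exporting tool call arguments and result data as span attributes.

Property	Description	Default
`spring.ai.tools.observations.include-content`	Include tool call content in observations. `true` or `false`.	`false`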

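EmbeddingModel Observations

The gen_ai.client.operation observations are recorded when an embedding model method is invoked. They measure the time the method takes to complete and propagate the related tracing information.

Attribute	Description
`gen_ai.operation.name`	The name of the operation being performed.
`gen_ai.system`	The model provider as identified by the client instrumentation.
`gen_ai.request.model`	The name of the model the request is made to.
`gen_ai.response.model`	The name of the model that generated the response.
`gen_ai.request.embedding.dimensions`	The number of dimensions of the resulting output embeddings.
`gen_ai.usage.input_tokens`	The number of tokens used in the model input.
`gen_ai.usage.total_tokens`	The total number of tokens used in the model exchange.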

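ImageModel Observations

The gen_ai.client.operation observations are recorded when an image model method is invoked. They measure the time the method takes to complete and propagate the related tracing information.

Attribute	Description
`gen_ai.operation.name`	The name of the operation being performed.
`gen_ai.system`	The model provider as identified by the client instrumentation.
`gen_ai.request.model`	The name of the model the request is made to.
`gen_ai.request.image.response_format`	The format in which the generated image is returned.
`gen_ai.request.image.size`	The size of the image to generate (for example `1024x1024`).
`gen_ai.request.image.style`	The style of the image to generate.

Logging configuration

Image prompt data is typically large and may contain sensitive information. For those reasons, it is not exported by default.

Spring AI supports logging image prompt data, which is useful for troubleshooting. When tracing is enabled, the logs include trace information for better correlation.

Property	Description	Default
`spring.ai.image.observations.log-prompt`	Log the image prompt content. `true` or `false`.	`false`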

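VectorStore Observations

All vector store implementations in Spring AI are instrumented through Micrometer to provide metrics and distributed tracing data.

db.vector.client.operation observations

Recorded when interacting with a vector store. They measure the time spent on query, add, and delete operations and propagate the related tracing information.

Attribute	Description
`db.operation.name`	The name of the operation or command being executed. One of `add`, `delete`, or `query`.
`db.system`	The database management system (DBMS) product as identified by the client instrumentation. One of `pg_vector`, `azure`, `cassandra`, `chroma`, `elasticsearch`, `milvus`, `neo4j`, `opensearch`, `qdrant`, `redis`, `typesense`, `weaviate`, `pinecone`, `oracle`, `mongodb`, `gemfire`, `simple`.
`spring.ai.kind`	The kind of framework API in Spring AI: `vector_store`.
`db.collection.name`	The name of a collection (table, container) within the database.
`db.namespace`	The name of the database, fully qualified within the server address and port.
`db.search.similarity_metric`	The metric used in similarity search.
`db.vector.dimension_count`	The dimension of the vector.
`db.vector.field_name`	The name of the field that stores the vector.
`db.vector.query.content`	The content of the search query being executed.
`db.vector.query.filter`	The metadata filters used in the search query.
`db.vector.query.response.documents`	The documents returned by a similarity search query. Optional.
`db.vector.query.similarity_threshold`	The similarity threshold for accepting search scores. A threshold of `0.0` accepts any similarity, which effectively disables threshold filtering; a threshold of `1.0` requires an exact match.
`db.vector.query.top_k`	The maximum number of most-similar vectors returned by a query (the k in top-k).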
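
To make the two query attributes `db.vector.query.similarity_threshold` and `db.vector.query.top_k` concrete, here is a minimal sketch of how a threshold and a top-k limit narrow a set of similarity scores. It assumes scores lie in the range 0.0 to 1.0 and that higher means more similar; `SimilarityThresholdSketch` and `select` are hypothetical names, not part of any vector store API.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

final class SimilarityThresholdSketch {

    // Keeps scores at or above the threshold, best first, capped at topK results.
    static List<Double> select(List<Double> scores, double threshold, int topK) {
        return scores.stream()
                .filter(score -> score >= threshold)
                .sorted(Comparator.reverseOrder())
                .limit(topK)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Double> scores = Arrays.asList(0.2, 1.0, 0.7);
        // Threshold 0.0 accepts every score.
        System.out.println(select(scores, 0.0, 10));
        // Threshold 1.0 keeps only exact matches.
        System.out.println(select(scores, 1.0, 10));
        // A mid-range threshold combined with topK = 1 keeps the single best hit.
        System.out.println(select(scores, 0.5, 1));
    }
}
```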

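Logging configuration

Vector search response data is typically large and may contain sensitive information. For those reasons, it is not exported by default.

Spring AI supports logging vector search response data, which is useful for troubleshooting. When tracing is enabled, the logs include trace information for better correlation.

Property	Description	Default
`spring.ai.vectorstore.observations.log-query-response`	Log the vector store query response content. `true` or `false`.	`false`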

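Prometheus Metrics Reference

This section documents the metrics that Spring AI components emit in Prometheus format.

Spring AI uses Micrometer. Base metric names are dot-separated (for example gen_ai.client.operation); on export to Prometheus the dots become underscores and the standard suffixes are appended:

Timers → _seconds_count, _seconds_sum, _seconds_max, and (when supported) _active_count
Counters → _total (monotonic)

Mapping of base metric names to Prometheus time series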

gen_ai.client.operation

gen_ai_client_operation_seconds_count
gen_ai_client_operation_seconds_sum
gen_ai_client_operation_seconds_max
gen_ai_client_operation_active_count

db.vector.client.operation

db_vector_client_operation_seconds_count
db_vector_client_operation_seconds_sum
db_vector_client_operation_seconds_max
db_vector_client_operation_active_count
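
The mapping above can be sketched as a small string transformation. This is only an illustration of the naming rule as this section describes it, not Micrometer's actual naming-convention code; `PrometheusNames`, `timerSeries`, and `counterSeries` are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;

final class PrometheusNames {

    // Dot-separated base name -> underscore-separated Prometheus prefix.
    static String base(String micrometerName) {
        return micrometerName.replace('.', '_');
    }

    // Series names listed in this section for a timer-style metric.
    static List<String> timerSeries(String micrometerName) {
        String b = base(micrometerName);
        return Arrays.asList(
                b + "_seconds_count",
                b + "_seconds_sum",
                b + "_seconds_max",
                b + "_active_count");
    }

    // Series name listed in this section for a counter.
    static String counterSeries(String micrometerName) {
        return base(micrometerName) + "_total";
    }

    public static void main(String[] args) {
        timerSeries("db.vector.client.operation").forEach(System.out::println);
        System.out.println(counterSeries("gen_ai.client.token.usage"));
    }
}
```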

Detailed metric descriptions

ChatClient metrics

Metric name	Type	Unit	Description
`gen_ai_chat_client_operation_seconds_sum`	Timer	seconds	Total time spent in ChatClient operations (call/stream)
`gen_ai_chat_client_operation_seconds_count`	Counter	count	Number of completed ChatClient operations
`gen_ai_chat_client_operation_seconds_max`	Gauge	seconds	Maximum observed duration of a ChatClient operation
`gen_ai_chat_client_operation_active_count`	Gauge	count	Number of ChatClient operations currently in progress

Active vs Completed: active_count shows calls that are in flight; the _seconds series reflect only completed calls.

ChatModel metrics

Metric name	Type	Unit	Description
`gen_ai_client_operation_seconds_sum`	Timer	seconds	Total time spent executing chat model operations
`gen_ai_client_operation_seconds_count`	Counter	count	Number of completed chat model operations
`gen_ai_client_operation_seconds_max`	Gauge	seconds	Maximum observed duration of a chat model operation
`gen_ai_client_operation_active_count`	Gauge	count	Number of chat model operations currently in progress
`gen_ai_client_token_usage_total`	Counter	tokens	Total tokens consumed, tagged by token type

Token type labels:

Label value	Description
`gen_ai_token_type=input`	Prompt tokens sent to the model
`gen_ai_token_type=output`	Completion tokens returned by the model
`gen_ai_token_type=total`	Input plus output

VectorStore metrics

Metric name	Type	Unit	Description
`db_vector_client_operation_seconds_sum`	Timer	seconds	Total time spent in vector store operations (add/delete/query)
`db_vector_client_operation_seconds_count`	Counter	count	Number of completed vector store operations
`db_vector_client_operation_seconds_max`	Gauge	seconds	Maximum observed duration of a vector store operation
`db_vector_client_operation_active_count`	Gauge	count	Number of vector store operations currently in progress

Common labels:

Label	Description
`db_operation_name`	Operation type (`add`, `delete`, `query`)
`db_system`	Vector database or provider (`redis`, `chroma`, `pg_vector`, and so on)
`spring_ai_kind`	`vector_store`

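Key concepts

Active (*_active_count): an instantaneous gauge of operations in progress (concurrency/load).
Completed (*_seconds_sum | count | max): statistics over completed operations:
- _seconds_sum / _seconds_count → average latency
- _seconds_max → high-water mark since the last scrape (depends on registry behavior)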
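
The average-latency relation above can be worked through with two scrapes of the cumulative series. The sketch below is an illustration under that assumption: it divides the change in `_seconds_sum` by the change in `_seconds_count`, which gives the mean latency for the window between the scrapes. Passing zeros as the starting values yields the lifetime average. `AverageLatency` and `meanSeconds` are hypothetical names.

```java
final class AverageLatency {

    // Mean latency in seconds between two scrapes of the cumulative sum and count.
    // Returns NaN when no operation completed in the window.
    static double meanSeconds(double sumStart, double countStart,
                              double sumEnd, double countEnd) {
        double completed = countEnd - countStart;
        if (completed <= 0) {
            return Double.NaN;
        }
        return (sumEnd - sumStart) / completed;
    }

    public static void main(String[] args) {
        // Example scrape values: 3 operations completed, taking 6 seconds in total.
        System.out.println(meanSeconds(10.0, 4.0, 16.0, 7.0));
        // Lifetime average from a single scrape.
        System.out.println(meanSeconds(0.0, 0.0, 16.0, 7.0));
    }
}
```

Averaging over a window rather than over the process lifetime keeps the figure responsive to recent changes in latency.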